6 The Nature of Information
It includes all possible correlations up to length $m$. Note that the first sum on the right-hand side is taken over all possible preceding sequences, and the second sum is taken over all possible symbols. The correlation information is defined as
$$k_m = S_{m-1} - S_m \qquad (m \ge 2)\,. \tag{6.23}$$
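As a quick numerical sketch (not from the text), the conditional entropies $S_m$ and the correlation information of Eq. (6.23) can be estimated from the overlapping $m$-blocks of a symbol string; all function names here are illustrative:

```python
from collections import Counter
from math import log2

def block_H(seq, m):
    """Empirical Shannon entropy (bits) of the overlapping length-m blocks of seq."""
    if m == 0:
        return 0.0
    blocks = [seq[i:i + m] for i in range(len(seq) - m + 1)]
    total = len(blocks)
    return -sum(c / total * log2(c / total) for c in Counter(blocks).values())

def S(seq, m):
    """S_m: entropy of one symbol conditioned on its m-1 predecessors."""
    return block_H(seq, m) - block_H(seq, m - 1)

def k(seq, m, n):
    """Correlation information: k_1 = log n - S_1 (Eq. 6.24);
    k_m = S_{m-1} - S_m for m >= 2 (Eq. 6.23)."""
    return log2(n) - S(seq, 1) if m == 1 else S(seq, m - 1) - S(seq, m)
```

For a regular string like $\ldots0101\ldots$, `S(seq, 1)` is 1 bit and `k(seq, 2, 2)` comes out close to 1, the small deviation stemming from the finite sample.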
$S_1$ is simply the Shannon information (Eq. 6.5). If the different symbols are a priori equally probable, the information is given by Hartley's formula (6.4).21 For $m = 1$,
$$k_1 = \log n - S_1 \tag{6.24}$$
is known as the density information. By recursion we can then write
$$\mathfrak{T} = \mathcal{S} + \sum_{m=1}^{\infty} k_m\,, \tag{6.25}$$
the total information $\mathfrak{T}$ being equal to $\log n$. The first term on the right gives the random component and is defined as $\mathcal{S} = \lim_{m \to \infty} S_m$, and the second one gives the redundancy. For a binary string, $S = 1$ if it is random, and the redundancy equals zero. For a regular string like $\ldots 010101 \ldots$, $S = 0$ and $k_2 = 1$; for a first-order Markov chain $k_m = 0$ for all $m > 2$.
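The telescoping character of Eq. (6.25) can be checked exactly for a hypothetical two-state first-order Markov chain, for which $S_m$ is the same for every $m \ge 2$ (by the Markov property) and hence $k_m = 0$ for $m > 2$; the transition probabilities below are illustrative:

```python
from math import log2

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Illustrative transition probabilities: P(1|0) = a, P(0|1) = b.
a, b = 0.1, 0.3
pi1 = a / (a + b)                        # stationary probability of symbol 1
S1 = h(pi1)                              # Shannon information S_1
S_inf = (1 - pi1) * h(a) + pi1 * h(b)    # S_m for all m >= 2

k1 = log2(2) - S1                        # density information, Eq. (6.24)
k2 = S1 - S_inf                          # Eq. (6.23); k_m = 0 for m > 2

print(S_inf + k1 + k2)                   # total information = log n = 1 bit
```

The sum collapses to $\log n$ whatever the transition probabilities, exactly as Eq. (6.25) requires.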
6.2.1 The Value of Information
In order to quantify value $V$, we need to know the goal toward which the information will be used. V.S. Chernavsky points to two cases that may be considered:
(i) The goal can almost certainly be reached by some means or another. In this
case a reasonable quantification is
V = (cost or time required to reach goal without the information)
−(cost or time required to reach goal with the information) .
(6.26)
(ii) The probability of reaching the goal is low. Then it is more reasonable to adopt
$$V = \log_2 \frac{\text{prob. of reaching goal with the information}}{\text{prob. of reaching goal without the information}}\,. \tag{6.27}$$
With both of these measures, irrelevant information is clearly zero-valued.
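With made-up numbers (all figures here are purely illustrative), the two measures behave as follows:

```python
from math import log2

# Case (i): the goal is reachable either way; value is the saving (Eq. 6.26).
cost_without, cost_with = 10.0, 4.0      # hypothetical cost units
V_certain = cost_without - cost_with     # 6.0 units saved

# Case (ii): success is improbable; value is the log-odds gain (Eq. 6.27).
p_with, p_without = 0.08, 0.01           # hypothetical success probabilities
V_improbable = log2(p_with / p_without)  # 3.0 bits

# Irrelevant information changes neither cost nor probability, so both
# measures assign it zero value.
```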
Durability of information contributes to its value. Intuitively, we have the idea
that the more important the information, the longer it is preserved. In antiquity,
21 The effective measure complexity is the weighted sum of the $k_m$ [viz., $\sum_{m=2}^{\infty}(m-1)k_m$]; see Eq. (11.27).